On the Linear Algebraic Structure of Distributed Word Representations

Author

  • Lisa Seung-Yeon Lee
Abstract

In this work, we leverage the linear algebraic structure of distributed word representations to automatically extend knowledge bases and allow a machine to learn new facts about the world. Our goal is to extract structured facts from corpora in a simpler manner, without applying classifiers or patterns, and using only the co-occurrence statistics of words. We demonstrate that the linear algebraic structure of word embeddings can be used to reduce data requirements for methods of learning facts. In particular, we demonstrate that words belonging to a common category, or pairs of words satisfying a certain relation, form a low-rank subspace in the projected space. We compute a basis for this low-rank subspace using singular value decomposition (SVD), then use this basis to discover new facts and to fit vectors for less frequent words for which we do not yet have vectors. This thesis represents my own work in accordance with university regulations.
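
The subspace idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual pipeline: the embeddings are synthetic random vectors, and the names `category_basis` and `subspace_score`, the dimension 50, and the rank 3 are all assumptions chosen for illustration. It shows only the general technique of taking the top right singular vectors of a matrix of category members as a basis, then scoring a candidate word by how much of its vector lies in that subspace.

```python
import numpy as np

def category_basis(vectors, rank):
    """Orthonormal basis (rank x dim) spanning the top singular directions.

    Rows of `vectors` are embeddings of words known to be in the category.
    """
    _, _, vt = np.linalg.svd(vectors, full_matrices=False)
    return vt[:rank]

def subspace_score(basis, vec):
    """Fraction of the vector's norm captured by the subspace (between 0 and 1)."""
    projection = basis @ vec
    return float(np.linalg.norm(projection) / np.linalg.norm(vec))

# Synthetic stand-in for real embeddings (an assumption, not data from the thesis):
# category members lie close to a hidden 3-dimensional subspace of a 50-dim space.
rng = np.random.default_rng(0)
dim, rank = 50, 3
true_basis = np.linalg.qr(rng.standard_normal((dim, rank)))[0].T  # rank x dim
members = (rng.standard_normal((20, rank)) @ true_basis
           + 0.01 * rng.standard_normal((20, dim)))

basis = category_basis(members, rank)

# A held-out vector from the same subspace versus an unrelated random vector.
in_category = rng.standard_normal(rank) @ true_basis
unrelated = rng.standard_normal(dim)
score_in = subspace_score(basis, in_category)
score_out = subspace_score(basis, unrelated)
print(f"in-category score: {score_in:.3f}, unrelated score: {score_out:.3f}")
```

With real embeddings, a candidate word whose score is close to 1 would be a plausible new member of the category, while a threshold for acceptance would have to be tuned on held-out data.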

Similar resources

A Universal Investigation of $n$-representations of $n$-quivers

We have two goals in this paper. First, we investigate and construct cofree coalgebras over $n$-representations of quivers, limits and colimits of $n$-representations of quivers, and limits and colimits of coalgebras in the monoidal categories of $n$-representations of quivers. Second, for any given quivers $\mathit{Q}_1$, $\mathit{Q}_2$, ..., $\mathit{Q}_n$, we construct a new quiver $math...

Deformation of Outer Representations of Galois Group

To a hyperbolic smooth curve defined over a number field one naturally associates an "anabelian" representation of the absolute Galois group of the base field landing in the outer automorphism group of the algebraic fundamental group. In this paper, we introduce several deformation problems for Lie-algebra versions of the above representation and show that this way we get a richer structure than t...

Some algebraic properties of Lambert Multipliers on $L^2$ spaces

In this paper, we determine the structure of the space of multipliers of the range of a composition operator $C_\varphi$ that is induced by the conditional expectation between two $L^p(\Sigma)$ spaces.

An Analysis of the RC4 Family of Stream Ciphers against Algebraic Attacks

To date, most applications of algebraic analysis and attacks on stream ciphers are on those based on linear feedback shift registers (LFSRs). In this paper, we extend algebraic analysis to non-LFSR based stream ciphers. Specifically, we perform an algebraic analysis on the RC4 family of stream ciphers, an example of stream ciphers based on dynamic tables, and investigate its implications to pot...

Unsupervised Text Normalization Using Distributed Representations of Words and Phrases

Text normalization techniques that use rule-based normalization or string similarity based on static dictionaries are typically unable to capture domain-specific abbreviations (custy, cx → customer) and shorthands (5ever, 7ever → forever) used in informal texts. In this work, we exploit the property that noisy and canonical forms of a particular word share similar context in a large noisy text ...

Journal title:
  • CoRR

Volume abs/1511.06961

Publication date: 2015